On Quadratic Interpolation of Image Cross-Correlation for Subpixel Motion Extraction

Digital image correlation techniques are well known for motion extraction from video images. Following a two-stage approach, the pixel-level displacement is first estimated by maximizing the cross-correlation between two images, then the estimation is refined in the vicinity of the cross-correlation peak. Among existing subpixel refinement methods, quadratic surface fitting (QSF) provides good performances in terms of accuracy and computational burden. It estimates subpixel displacement by interpolating cross-correlation values with a quadratic surface. The purpose of this paper is to analytically investigate the QSF method. By means of counterexamples, it is first shown in this paper that, contrary to a widespread intuition, the quadratic surface fitted to the pixel-level cross-correlation values in the neighborhood of the cross-correlation peak does not always have a maximum. The main contribution of this paper then consists in establishing the mathematical conditions ensuring the existence of a maximum of this fitted quadratic surface, based on a rigorous analysis. Algorithm modifications for handling the failure cases of the QSF method are also proposed in this paper, in order to consolidate it for subpixel motion extraction. Experimental results based on two typical types of images are also reported.


Introduction
Computer vision techniques for motion extraction are widely developed in a huge variety of applications, including motion tracking, motion compensation, image registration [1], remote sensing [2], biomedicine [3], satellite imagery [4] and vibration analysis [5]. Within this scope, techniques of digital image correlation (DIC) are known to provide accurate results with a high computational efficiency, along with good robustness against noises. Various variants of DIC exist in the literature, for example, phase-only correlation (POC) [2], upsampling cross-correlation (UCC) [6], Fourier-based correlation [1] and virtual image correlation [7].
In this paper, a well-known image correlation technique for subpixel motion extraction is analytically investigated. Subpixel accuracy is particularly important for video-based structural health monitoring (SHM) [5,8] and for aerial or satellite imagery [4]. In such applications, the region of interest may be represented by a small number of pixels in the captured images. It is then important to extract both multipixel and subpixel motions information from video images.
A large variety of correlation-based techniques for motion extraction with subpixel accuracy have been compared in [1,2]. The most widespread techniques are performed in a two-step process. At the pixel level, displacement is estimated by maximizing the crosscorrelation between two images. To achieve subpixel accuracy, displacement estimation is then refined in the vicinity of the cross-correlation peak. Among such refinement methods, quadratic surface fitting (QSF) provides a good trade-off between accuracy and computational burden, particularly suitable for video-based SHM, as reported in [9]. This method and its variant forms have also been investigated in [1,2,[10][11][12][13][14]. Its good performance has been confirmed by our own experiments.
The purpose of this paper is to mathematically analyze the QSF method. By means of counterexamples, it will be shown that, contrary to a widespread intuition, the quadratic surface fitted to the cross-correlation values in the 3 × 3 pixels neighborhood of the correlation peak does not always have a maximum, despite the fact that the maximum pixel-level cross-correlation value is located at the center of this 3 × 3 pixels neighborhood. This absence of maximum leads to a failure of the QSF method, which should determine the subpixel displacement by maximizing the fitted quadratic surface interpolating the crosscorrelation values in the vicinity of the cross-correlation peak. However, experiences reported by different authors and conducted by ourselves show that usually the QSF method produces satisfactory results. Then it is important to understand the conditions under which this method works correctly. In this paper, the conditions ensuring the existence of a maximum of the fitted quadratic surface will be formally analyzed. Then these conditions will be completed to make sure that the maximum is within the one-pixel vicinity of the pixel-level cross-correlation peak. Solutions will also be proposed to handle the failures cases of the QSF method by constrained optimization ensuring that the estimated subpixel displacement is within one pixel. These modifications apply only when a failure occurs, hence the extra numerical computation cost is insignificant.
This paper is an extended version of the conference paper [15]. It is organized as follows. The considered problem is formulated in Section 2. The QSF method is recalled in Section 3. Examples showing failures of the QSF method are presented in Section 4. The QSF method is then analyzed in Section 5. Handling of the failure cases is proposed in Section 6. Experimental results based on two typical types of images are reported in Section 7. Finally, conclusions are drawn in Section 8.

Problem Statement
A locally rigid moving object is observed with a camera. It is assumed that the displacement of the observed object is small between successive images, with negligible rotation and negligible motion along the optical axis direction. Correlation processing will focus on a rectangular template, also known as region of interest (ROI), which includes either the whole moving object or some part of the object. The intensity of each pixel at instant t is denoted by I(n, m, t) where the integer pair (n, m) indicates the position of the pixel in an image.
Instead of the usual row and column indexes, in this paper the pair (n, m) denotes pixel integer coordinates in a Cartesian system, as illustrated in Figure 1. It will serve both to describe pixel positions in an image and to fit quadratic surfaces in the QSF method. In the second usage, the origin (x, y) = (0, 0) corresponds to zero subpixel displacement. This notation choice is more suitable for the surface fitting problem formulated in the Cartesian coordinate system, in agreement with usual mathematical notations.
Motion extraction will be carried out by determining the horizontal and vertical shifts of the selected template between two images captured at instants t and t + ∆t. The horizontal image shift (in number of pixels) is decomposed into an integer partñ and a fractional (subpixel) part x with |x| < 1, and similarly the vertical image shift is decomposed into an integer partm and a fractional part y with |y| < 1, so that: Total vertical shift =m + y.
Given two images (frames) captured at time instants t and t + ∆t, as illustrated in Figure 2, template shifts are usually estimated through a two-step process [9]. The pixel level (integer) shifts (ñ,m) are first estimated by maximizing the cross-correlation between the two image templates. At the second step, the cross-correlation is somehow interpolated in the vicinity of the cross-correlation peak to estimate the subpixel shifts (x, y).
where T denotes the set of integer pairs (n, m) corresponding to the pixels belonging to the considered template. The dependences on I(n, m, t) and on I(n, m, t + ∆t) are omitted in the notation r(ñ,m) for a lighter presentation. The search for the cross-correlation peak is formulated as: whereÑ andM are two positive integers specifying the search ranges respectively for horizontal and vertical shifts. In order to gain subpixel accuracy, at the second step, the cross-correlation r(ñ,m) is somehow interpolated for non integer shifts so that the correlation maximization (3) can be generalized to subpixel shifts.
In the QSF method, this interpolation is made by fitting, in the least squares sense, a second degree polynomial (or, geometrically, a quadratic surface) to the value of r(ñ * ,m * ) and to the 8 neighboring cross-correlation values r(ñ,m), namely r(ñ * + n,m * + m) with n, m ∈ {−1, 0, +1}. Then the maximum of the fitted polynomial yields the estimated subpixel shifts between the two templates [9].
More formally, the integer shifts (ñ * ,m * ) being already estimated, let p(x, y) denote the second degree polynomial fitted, in the least squares sense, to r(ñ * + x,m * + y) for x, y ∈ {−1, 0, +1}. Then, the subpixel shifts are estimated as and the estimated total shifts amount to: Satisfactory experimental results of this method have been reported by different authors, for example, [1,2,9]. Our own experiments also confirm its good performance compared to other existing methods for subpixel shift estimation. The main purpose of this paper is to consolidate the theoretical basis of this method.
More specifically, the QSF method, as recalled above, assumes implicitly that the second degree polynomial p(x, y) fitted to the nine correlation values r(ñ * + n,m * + m), with n, m ∈ {−1, 0, +1}, always has a unique global maximum, corresponding to (x * , y * ) located in the one-pixel vicinity of (x, y) = (0, 0), so that the total shifts as expressed in (5) do not fall too far from the pixel level optimal shifts (ñ * ,m * ). This paper will investigate the following issues.

1.
Does the quadratic surface fitted in the QSF method always have a maximum in the one-pixel vicinity of the pixel-level cross-correlation peak? 2.
If the answer to the first question is no, what are the conditions ensuring that the fitted quadratic surface has a maximum, and moreover, the maximum is located in the one-pixel vicinity of the pixel-level cross-correlation peak? 3.
What should the algorithm do if the fitted quadratic surface ever has no maximum, or if its maximum is outside the one-pixel vicinity of the pixel-level cross-correlation peak?

Quadratic Surface Fitting for Subpixel Refinement
The QSF method is recalled in this section before its analysis in the next sections. Let (ñ * ,m * ) be resulting from the pixel level maximization (3). The nine integer pairs (n, m), with n, m ∈ {−1, 0, +1}, form a 3 × 3 grid (The grid G is formed by nine integer pairs organized in three rows and three columns. It is also seen as a set with the integer pairs as elements, so that notations like (−1, 1) ∈ G can be used.) : Accordingly, the nine cross-correlation values r(ñ * + n,m * + m) normalized by the maximum cross-correlation value r(ñ * ,m * ) and denoted by: The central entry of Γ is the maximum cross-correlation value r(ñ * ,m * ) normalized by itself, hence this central entry γ(0, 0) = 1 is the maximum value among all the nine entries of Γ. The second degree polynomial: with the vector θ ∈ R 6 collecting the scalar coefficients θ 1 , . . . , θ 6 , is then fitted to the entry values of Γ for (x, y) ∈ G, by solving the least squares problem: where G is the grid defined in (6). Does this fitted second degree polynomial p θ (x, y) always have a unique global maximum?
As explained above, the central entry of the matrix Γ, namely γ(0, 0) = 1, is the maximum value among all the nine entries of Γ. It then seems reasonable to expect that the second degree polynomial (or quadratic surface) p θ (x, y) fitted to the nine entries of Γ has a maximum somehow close to the (maximum) central entry of the matrix Γ, which corresponds to the origin (x, y) = (0, 0) of the coordinate system characterizing the fitted quadratic surface.
Unfortunately, the fact that the central entry γ(0, 0) is the maximum value among the nine entries of Γ does not really ensure that the fitted second degree polynomial p θ (x, y) always has a global maximum, as demonstrated by the following counterexamples.

Counterexamples
Three examples with either synthetic or real-world images are presented below to show that the quadratic surface fitted in the QSF method does not always have a maximum in the close vicinity of the pixel-level cross-correlation peak.

Example 1
The first counterexample with synthetic images was chosen for its simplicity so that it can be easily reproduced. The Matlab code for generating the presented result is available for download [16]. To show the robustness of this counterexample, the Matlab code can be optionally run with some random noises added to the generated synthetic images, though the result presented below is noise-free. Counterexamples based on true images (as presented in Examples 2 and 3) are also included in [16].
Consider two binary images of 18 × 18 pixels as shown in Figure 3, with the template (ROI) chosen as the red square window of 8 × 8 pixels in each image. The template in the first image contains a diagonal pattern. In the second image, this diagonal pattern is shifted by one pixel toward the right and also by one pixel toward the bottom. In these binary images, the intensity is 1 at the darker pixels and 10 at the brighter pixels. The normalized 3 × 3 cross-correlation matrix around the peak as defined in (8) Fitting the second degree polynomial p θ (x, y) to Γ for (x, y) ∈ G by solving the least squares problem (11) yields the solution: The corresponding quadratic surface p θ * (x, y) exhibits a saddle point, as illustrated in Figure 4. It has no global maximum, despite the fact that the central entry of the matrix Γ is its largest entry. This counterexample clearly invalidates the widespread intuition that the fitted polynomial p θ (x, y) in the QSF method always has a global maximum.

Example 2
This example is based on true images of the moon, included in Matlab as a part of its image examples. The Matlab code generating the presented results are available for download [16]. Two images of the moon with 537 × 358 pixels are shown in Figure 5. As in Example 1, with some chosen template (ROI), cross-correlations between the two images are computed, and a quadratic surface is fitted to the normalized 3 × 3 cross-correlation matrix around the peak. In most situations, the fitted surface has a maximum close to the pixel-level cross-correlation peak, but unexpected cases do happen with some particular template choices.
For the chosen template illustrated by the red windows in Figure 5, the fitted quadratic surface is shown in Figure 6. The surface exhibiting a saddle point has no maximum. This result can be reproduced with the Matlab code downloadable from [16].

Example 3
This example is based the same images of the moon as in Example 2, but with another choice of the template for cross-correlation computation, as illustrated by the red windows in Figure 7. The fitted quadratic surface shown in Figure 8 has a maximum located at (x * , y * ) = (13.76, 4.28), as indicated by the cyan dot on the surface. This maximum is far outside the square area of the one-pixel vicinity of the pixel-level cross-correlation peak delimited by the red dotted lines on the bottom plane in Figure 8. As a subpixel refinement, the maximum of the fitted surface should satisfy |x * | < 1 and |y * | < 1, which is not the case in this example. The details of this result are available from [16].
The examples presented in this section confirm that, in some rare situations, the quadratic surface fitted in the QSF method does not have a maximum, or has a maximum far away from the square area of the one-pixel vicinity of the pixel-level cross-correlation peak.
After the answer to the first question raised at the end of Section 2, the following sections will then answer the other two remaining questions.   Figure 7, the fitted quadratic surface has a maximum located at (x * , y * ) = (13.76, 4.28), as shown by the cyan dot on the surface. The red dotted lines on the bottom plane delimit the square area of the one-pixel vicinity around the pixel-level cross-correlation peak. As a subpixel refinement, the maximum of the fitted surface should satisfy |x * | < 1 and |y * | < 1, which is not the case in this example. See [16] for the details of this example.

Conditions for the Existence of a Maximum
In elementary algebra, it is well known that the second degree polynomial p θ (x, y) as expressed in (10) has a unique global maximum if its Hessian matrix, is negative definite [17]. However, this simply stated fact does not directly help us to understand how the normalized cross-correlation values γ(n, m) (those filling up the Γ matrix in (8)) should be, so that the fitted p θ (x, y) has a global maximum. Because the polynomial coefficients θ 1 , . . . , θ 6 are determined from the values of γ(n, m) by solving the least squares problem (11), it is straightforward to express the negative definiteness condition of H in terms of γ(n, m). Then, in principle, the condition for the existence of a global maximum of p θ (x, y) will be formulated in terms of the normalized cross-correlation values γ(n, m). Nevertheless, this approach results in sophisticated conditions, notably an inequality involving the determinant of H expressed in terms of γ(n, m). For a better understanding, the result presented below will be formulated with simple and easily interpretable inequalities about the values γ(n, m) filling up Γ. For instance, one of these simple inequalities states that the central entry γ(0, 0) of Γ is its largest entry. As shown by the previously presented counterexamples, this condition alone is not sufficient. It is then completed by similar simple inequalities. Theorem 1. If the normalized cross-correlation values γ(n, m) filling up the matrix Γ satisfy: γ(0, m) > γ(n, m) for all n, m ∈ {−1, +1} γ(n, 0) > γ(n, m) for all n, m ∈ {−1, +1}, then the second degree polynomial p θ (x, y) fitted to the entries of Γ by solving the least squares problem (11) has a unique global maximum.
Interpretation of the conditions of Theorem 1. Proof of Theorem 1. In order to shorten lengthy equations and inequalities, let us introduce more compact notations for the normalized cross-correlation values γ(m, m) filling up the matrix Γ defined in (8), so that Γ is rewritten as: Remark that the letters a, b, d, . . . , i fill Γ first at the four corners, then the middles of side rows and columns, before finishing at the central entry.
With these compact notations, the least squares solution (11) leads to As already mentioned in this paper, the negative definiteness of the Hessian matrix H defined in (14) ensures that the polynomial p θ (x, y) has a global maximum. Based on Sylvester's criterion (Usually Sylvester's criterion [18] is about the positive definiteness of a real symmetric (or complex Hermitian) matrix. It is trivial to translate this criterion to the case of negative definiteness; this negative definiteness will be checked through: According to the inequalities assumed in (16), e (or h, resp.) is the largest entry of the top (or bottom, resp.) row of Γ, then and according to (15), i is the largest entry of Γ, then These inequalities together with (19d) immediately imply (20). It is more involved to check (21). The inequalities in (22) imply: Repeat the reasoning from (25) to (27) while interchanging the positions of (e − a) and (e − b): leading to Combining (27) and (29) yields This result expresses a relationship between the entries in the top row of the matrix Γ. A similar reasoning then leads to the following relationship between the entries in the bottom row of Γ: According to the inequalities assumed in (15), i is the largest entry of Γ, then Take the sums of the respective sides of the four inequalities in (30)-(32), then This result then implies that −6θ 4 and 4θ 5 , as expressed respectively in (19d) and (19e), satisfy: Following the same approach, it is then similarly shown that This last result can also be deduced from a certain "symmetry" between the formulae expressing θ 4 and θ 6 in (19d) and (19f).
It then follows from (14), (36) and (37) that Therefore, the two inequalities (20) and (21) ensuring the negative definiteness of H are successfully checked. It is then established that the second degree polynomial p θ (x, y) has a unique global maximum.
When the fitted polynomial p θ (x, y) has a unique global maximum, it may happen that this maximum is far away from the origin (x, y) = (0, 0) corresponding to the optimized integer shifts (ñ * ,m * ), outside the square area of the one-pixel vicinity of the cross-correlation peak. Such situations are not desirable, since the subpixel refinement should not modify the estimated shifts by more than one pixel. The following result ensures that the maximum of p θ (x, y) stays inside the one-pixel vicinity, under easily interpretable conditions. Theorem 2. If, in addition to the conditions of Theorem 1, the normalized cross-correlation values γ(n, m) filling up the matrix Γ satisfy, for all n, m ∈ {−1, +1}, then the maximum of the fitted polynomial p θ (x, y) is located at (x * , y * ) such that |x * | < 1 and |y * | < 1.
Interpretation of the conditions of Theorem 2. The conditions inherited from Theorem 1 ensure that the middle entry in each row or column of Γ is the largest entry of the row or column, without imposing any "degree of symmetry". For example, among the top row of Γ as expressed in (8), inequalities formulated in (16) ensure that γ(0, 1) is the largest entry, but the ratio [γ(0, 1) − γ(−1, 1)/[γ(0, 1) − γ(1, 1)] can be any positive number. Two of the inequalities in the extra condition (39) of Theorem 2 constrain this ratio between 1/5 and 5, thus limiting the dissymmetry between γ (−1, 1) and γ(1, 1).

Proof of Theorem 2.
Based on Theorem 1 (its conditions are inherited here), the fitted polynomial p θ (x, y) has a unique global maximum in R 2 , which is located at In order to prove |x * | < 1, it will be shown that the numerator of |x * | is smaller than its denominator. The proof for proving |y * | < 1 will be made similarly.
With the compact notations filling up the matrix Γ introduced in (18) for the normalized correlation values γ(n, m), one of the inequalities contained in (39), namely is translated into Because i is the largest entry of Γ (condition inherited from Theorem 1), Summarizing (51), (52) and (55) then implies that θ 4 as expressed in (19d) satisfies: This result, together with (19e), then yields to: Some similar reasonings (due to a certain "symmetry" between θ 4 and θ 6 in (19)) then lead to −θ 6 > |θ 5 | ≥ 0. (60) Add together the respective sides of (19b) and (19d), then, Every parenthesis at the right hand side of (61) is positive, due to conditions inherited from Theorem 1. Then, Similarly, Based on the last 2 inequalities, −θ 4 satisfies: In the same manner, combining (19c) and (19f) yields: The following steps of the proof will be essentially based on: respectively, due to (59), (60), (65) and (68). On the one hand, (71), (72) and (69) imply (the step leading to (74) below) and on the other hand, (69) and (70) lead to (the step leading to (78) below) Therefore It is then concluded that x * as expressed in (41) satisfies |x * | < 1.
In the same way, it is also proved that: The proof of Theorem 2 is thus completed.

Handling Failures of the QSF Method
There are two possible failure cases: • Case 1, the fitted polynomial p θ (x, y) has no global maximum; • Case 2, p θ (x, y) has a global maximum, reached at (x * , y * ), but |x * | ≥ 1 and/or |y * | ≥ 1.
In the first case, the only reasonable proposition is to retain the optimized integer shifts (ñ * ,m * ) as the estimated total shifts.
In the second case, solutions on the boundary satisfying |x * | ≤ 1 and |y * | ≤ 1 are accepted. If |x * | > 1 and/or |y * | > 1, then the subpixel shifts are estimated by solving the constrained optimization problem: and the estimated total shifts amount to: (ñ * + x ,m * + y ).
As p θ (x, y) is a second degree polynomial, the constrained optimization problem (83) can be solved by quadratic programming algorithms. In this considered Case 2, the unique unconstrained maximum of p θ (x, y) is outside the square area constrained by |x| ≤ 1 and |y| ≤ 1, hence the constrained solution is certainly on the boundary of the constrained square area, on one of its sides or on one of its corners. Given the simplicity of this quadratic problem, instead of applying a general quadratic programming tool, the constrained optimization problem (83) can be solved as follows: • Compute the values of p θ (x, y) at the four corners of the square, namely p θ (±1, ±1); • Find the maximums of p θ (x, ±1) in x and p θ (±1, y) in y: Eliminate any result(s) not satisfying |x ± | ≤ 1 or |y ± | ≤ 1. • Find the maximum value among the four corner values p θ (±1, ±1) and the noneliminated values p θ (x ± , ±1) and/or p θ (±1, y ± ), if any. The solution of the constrained optimization problem (83) is then given by the location of this maximum.

Assessment Based on Two Typical Types of Images
The original QSF method and the modified method are tested with two typical types of images for failure rate evaluation. As already stated in the introduction, the original QSF method works correctly in most situations. The occurrence frequency of its failures depends on the type of processed images. Two typical examples are presented below in this section.
For the moon example already considered in Section 4, when the template window is close to a corner of the left side of the images, as shown in Figure 5, the selected region of interest has a diagonally dominant pattern, sharing a common characteristic with the synthetic example shown in Figure 3. For this reason, the QSF method is more likely to encounter difficulties with these images.
A second example comes from the publicly available DIC Challenge database [19,20] provided by the Society for Experimental Mechanics. The tested images contain uniform patterns, as shown in Figure 9.  [19], each with resolution 512 × 512 pixels. The the red windows correspond to an example of template for the QSF method.
For each trial with the original QSF method, the template window is placed at a different position in the processed images, as illustrated in Figures 5 and 9. Among all the trials, the number of cases where the fitted quadratic surface exhibits a saddle point (absence of maximum) is counted. The second type of failures, with maximum outside the one-pixel vicinity, is also counted. The results are summarized in Table 1. Total trials: number of total trials of the QSF method. Absence of maximum: number of trials yielding a saddle point (and in percentage of total trials). Maximum outside the one-pixel vicinity: number of trials yielding a maximum of the fitted surface located at (x * , y * ) with |x * | > 1 or |y * | > 1 (and ppm, part per million). max |x * |: the maximum value of |x * | among all the trials. max |y * |: the maximum value of |y * | among all the trials. As expected, the moon example leads to more failures (0.21% and 4.79% of the two types of failures over the total number of trials) due to the existence of diagonally dominant patterns. For the second type of failures, the maximum of the fitted surface can be far away from the the one-pixel vicinity, up to 52 pixels. On the other hand, the DIC Challenge example with uniform patterns has very few failures (8.74 ppm and 52.4 ppm).

Absence of Maximum
Though failures of the original QSF method rarely happen, it is important to prevent them for reliable applications.
After modifications of the QSF method as proposed in Section 6, for each the trials with the above two examples, the modified method successfully finds a subpixel displacement (x * , y * ) such that |x * | < 1 and |y * | < 1.

Conclusions
In this paper, within the scope of digital image correlation techniques, some theoretical aspects of the QSF method have been investigated. It has been shown that, contrary to a widespread intuition, the quadratic surface fitted in the QSF method does not always have a maximum. Then, for a better understanding of this method, it is analyzed by providing mathematical conditions ensuring expected results. Algorithm modifications have also been proposed to handle unexpected cases. Finally, experimental results based on two typical types of images have been reported. These results will contribute to consolidating both the theory and the practice of the QSF method.